Clustering Multivariate Normal Distributions
Abstract
In this paper, we consider the task of clustering multivariate normal distributions with respect to the relative entropy into a prescribed number, k, of clusters using a generalization of Lloyd’s k-means algorithm [1]. We revisit this information-theoretic clustering problem under the auspices of mixed-type Bregman divergences, and show that the approach of Davis and Dhillon [2] (NIPS*06) can also be derived directly, by applying the Bregman k-means algorithm, once the proper vector/matrix Legendre transformations are defined. We further explain the dualistic structure of the sided k-means clustering, and present a novel k-means algorithm for clustering with respect to the symmetrical relative entropy, the J-divergence. Our approach extends to differential entropic clustering of arbitrary members of the same exponential families in statistics.
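The one-sided clustering described above can be illustrated with a minimal sketch. It assumes NumPy, and the function names (`kl_gaussian`, `kl_centroid`, `kl_kmeans`) are hypothetical, not taken from the paper. It covers only the case where each centroid sits in the second argument of the relative entropy, KL(p_i || c), for which the minimizer is obtained by averaging first and second moments; it does not implement the symmetrized J-divergence algorithm.

```python
import numpy as np

def kl_gaussian(m0, S0, m1, S1):
    """Relative entropy KL(N(m0, S0) || N(m1, S1)) between two Gaussians."""
    d = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                  - d + logdet1 - logdet0)

def kl_centroid(means, covs):
    """Gaussian c minimizing the average KL(p_i || c): moment matching."""
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    m = means.mean(axis=0)
    # Average the second moments E[x x^T] = S_i + m_i m_i^T, then recentre.
    second = covs + np.einsum("ni,nj->nij", means, means)
    S = second.mean(axis=0) - np.outer(m, m)
    return m, S

def kl_kmeans(means, covs, k, n_iter=50, seed=0):
    """Lloyd-style iteration: assign by KL(p_i || c_j), then update centroids."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    n = len(means)
    idx = rng.choice(n, size=k, replace=False)  # seed centroids from the inputs
    c_means, c_covs = means[idx].copy(), covs[idx].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        dist = np.array([[kl_gaussian(means[i], covs[i], c_means[j], c_covs[j])
                          for j in range(k)] for i in range(n)])
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = labels == j
            if members.any():  # leave an empty cluster's centroid unchanged
                c_means[j], c_covs[j] = kl_centroid(means[members], covs[members])
    return labels, c_means, c_covs
```

For example, calling `kl_kmeans(means, covs, k=2)` on an array of mean vectors of shape `(n, d)` and covariance matrices of shape `(n, d, d)` returns one cluster label per input Gaussian together with the centroid parameters. The random seeding is a simplification chosen for brevity; any other initialization could be substituted.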
Related articles
Robust Fuzzy Classification Maximum Likelihood Clustering with Multivariate t-Distributions
Mixtures of distributions have been used as probability models for clustering data. The classification maximum likelihood (CML) procedure is a popular mixture maximum likelihood approach to clustering. Yang (1993) extended CML to fuzzy CML (FCML) for a normal mixture model, called FCML-N. However, normal distributions are not robust to outliers. In general, t-distributions should be more robust...
A Comparison of Information Criteria in Clustering Based on Mixture of Multivariate Normal Distributions
Clustering analysis based on a mixture of multivariate normal distributions is commonly used in the clustering of multidimensional data sets. Model selection is one of the most important problems in mixture cluster analysis based on the mixture of multivariate normal distributions. Model selection involves the determination of the number of components (clusters) and the selection of an appropri...
Comparing Mean Vectors Via Generalized Inference in Multivariate Log-Normal Distributions
In this paper, we consider the problem of comparing means in several multivariate log-normal distributions and propose a useful method called the generalized variable method. Simulation studies show that the suggested method has appropriate size and power regardless of sample size. To evaluate this method, we compare it with traditional MANOVA such that the actual sizes of the two methods ...
Pattern Clustering by Multivariate Mixture Analysis.
Cluster analysis is reformulated as a problem of estimating the parameters of a mixture of multivariate distributions. The maximum-likelihood theory and numerical solution techniques are developed for a fairly general class of distributions. The theory is applied to mixtures of multivariate normals (NORMIX) and mixtures of multivariate Bernoulli distributions (Latent Classes). The feasibili...
Rejoinder to the discussion of "Model-based clustering and classification with non-normal mixture distributions"
Non-normal mixture distributions have received increasing attention in recent years. Finite mixtures of multivariate skew-symmetric distributions, in particular, the skew normal and skew t-mixture models, are emerging as promising extensions to the traditional normal and t-mixture models. Most of these parametric families of skew distributions are closely related, and can be classified into fou...
Optimal clustering of multivariate normal distributions using divergence and its application to HMM adaptation
We present an optimal clustering algorithm for grouping multivariate normal distributions into clusters using the divergence, a symmetric, information-theoretic distortion measure based on the Kullback-Leibler distance. Optimal solutions for normal distributions are shown to be obtained by solving a set of Riccati matrix equations and the optimal centroids are found by alternating the mean and c...